perm filename COMP1.WRU[206,JMC] blob sn#070522 filedate 1973-11-07 generic text, type T, neo UTF8
                     COMPUTER SCIENCE DEPARTMENT
                         STANFORD UNIVERSITY


CS 206           COMPUTING WITH SYMBOLIC EXPRESSIONS        FALL 1973


        ADDITIONAL INFORMATION ABOUT LISP COMPILING


1. Some facts about the PDP-10.

	The following facts about the PDP-10 may be of use.

	The  PDP-10  has  a  36  bit  word and an 18 bit address.  In
instructions and in accumulators used as index registers this is  the
right part of the word where the least significant bits in arithmetic
reside.

	There are 16 general registers which serve simultaneously  as
accumulators  (receiving the results of arithmetic operations), index
registers (modifying the nominal addresses of  instructions  to  form
effective addresses), and as the first 16 registers of memory (if the
effective address of  an  instruction  is  less  than  16,  then  the
instruction uses the corresponding general register as its operand.

	All instructions have the same format and are written for the
LAP assembly program in the form

	(<op name> <accumulator> <address> <index register>).

Thus %(MOVE 1 3 P)% causes accumulator %1% to receive the contents of
a  memory  register whose address is %3+c(P), i.e. 3+<the contents of
general register P.  If the op name is followed by %@,then the memory
reference is indirect, i.e. the effective addrress is the contents of
the right half of the word  directly  addressed  by  the  instruction
(modified  by  the index register and indirect bit of that word).  In
the following description of some instructions useful in constructing
the compiler, <ef> denotes the effective address of an instruction.

MOVE			c(ac) ← c(<ef>)
MOVEI			c(ac) ← <ef>
MOVEM			c(<ef>) ← c(ac)
HLRZ			c(left half ac) ← right half of c(<ef>)
HRRZ			c(right half ac) ←c(right half of c(<ef>)
	(These two are used indirectly for %car% and %cdr% respectively.)
ADD			c(ac) ← c(ac) + c(<ef>)
SUB			c(ac) ← c(ac) - c(<ef>)
JRST			go to <ef>
JUMPE			if c(ac) = 0 then go to <ef>
JUMPN			if c(ac) ≠ 0 then go to <ef>
CAME			if c(ac) = c(<ef>) then <skip next
CAMN			if c(ac) ≠ 0 then <skip next instruction>
PUSH			c(c(right half of ac)) ← c(<ef)); the contents
			of each half of ac is increased by one and if
			the contents of the left half is then %0, a stack
			overflow interrupt occurs.  (PUSH P ac) is
			is used in LISP to put c(ac) on the 
			stack.
POP			c(<ef>) ←c(c(right half of ac)); the
			contents of each half of ac are then decreased
			by %1.  This may be used for removing the
			top element of the stack.  Thus (POP P 1) puts
			the top element of the stack in accumulator 1.
			The i-th element of the stack is obtained by
			(MOVE@ 1 i P).
POPJ			(POPJ P) is used for returning from a subroutine

	These instructions are adequate for compiling basic LISP code
with  the addition of the subroutine calling pseudo-instrucion. (CALL
n (E <subr)) is used for calling the LISP subroutine <subr> with  %n%
arguments.   The  convention  is that the arguments will be stored in
successive accumulators beginning with accumulator %1, and the result
will  be  reurned  in  accumulator  %1.   In particular the functions
%ATOM% and %CONS% are called with %(CALL 1 (E ATOM))% and (CALL 2  (E
CONS))  respectively,  and  programs produced by you compiler will be
called similarly when their names are referred to.

2. Code produced by LISP compilers.

	I have written two compilers, the  simple  one  called  LCOM0
and more optimising compiler called LCOM4.
Currently, LCOM4 produces about  half  as
many  instructions for a given function as does LCOM0. Besides these,
there is the standard PDP-10  LISP  compiler  written  at  M.I.T.  by
Richard  Greenblatt  and  Stuart  Nelson  and  modified  by Whitfield
Diffie.

	Consider the function %union% for giving  the  union  of  two
unordered lists.  We have

	union(u,v) ← if n u then v else if a u ε v then
		union(d u,v) else a u . union(d u,v).

Its LISP definition is

(DE UNION (U V) (COND ((NULL U) V) ((MEMBER (CAR U) V)
	(UNION (CDR U) V)) (T (CONS (CAR U) (UNION (CDR U) V)))))

Here are the programs produced by the three compilers.

The basic compiler LCOM0 generates the following 46
instructions:

(LAP UNION SUBR) 
(PUSH P 1) 
(PUSH P 2) 
(MOVE 1 -1 P) 
(JUMPN 1 G0157) 
(MOVE 1 0 P) 
(JRST G0156) 
G0157 
(MOVE 1 -1 P) 
(HLRZ@ 1 1) 
(PUSH P 1) 
(MOVE 1 -1 P) 
(PUSH P 1) 
(SUB P (C 0 0 2 2)) 
(MOVE 2 2 P) 
(MOVE 1 1 P) 
(CALL 2 (E MEMBER)) 
(JUMPE 1 G0158) 
(MOVE 1 -1 P) 
(HRRZ@ 1 1) 
(PUSH P 1) 
(MOVE 1 -1 P) 
(PUSH P 1) 
(SUB P (C 0 0 2 2)) 
(MOVE 2 2 P) 
(MOVE 1 1 P) 
(CALL 2 (E UNION)) 
(JRST G0156) 
G0158 
(MOVE 1 -1 P) 
(HLRZ@ 1 1) 
(PUSH P 1) 
(MOVE 1 -2 P) 
(HRRZ@ 1 1) 
(PUSH P 1) 
(MOVE 1 -2 P) 
(PUSH P 1) 
(SUB P (C 0 0 2 2)) 
(MOVE 2 2 P) 
(MOVE 1 1 P) 
(CALL 2 (E UNION)) 
(PUSH P 1) 
(SUB P (C 0 0 2 2)) 
(MOVE 2 2 P) 
(MOVE 1 1 P) 
(CALL 2 (E CONS)) 
(JRST G0156) 
G0159 
G0156 
(SUB P (C 0 0 2 2)) 
(POPJ P) 
NIL 

	LCOM4 produces the following program:

(LAP UNION SUBR) 
(PUSH P 1) 
(PUSH P 2) 
(MOVE 1 -1 P) 
(JUMPN 1 G0157) 
(MOVE 1 0 P) 
(JRST 0 G0156) 
G0157 
(HLRZ@ 1 -1 P) 
(MOVE 2 0 P) 
(CALL 2 (E MEMBER)) 
(JUMPE 1 G0158) 
(HRRZ@ 1 -1 P) 
(MOVE 2 0 P) 
(CALL 2 (E UNION)) 
(JRST 0 G0156) 
G0158 
(HRRZ@ 1 -1 P) 
(MOVE 2 0 P) 
(CALL 2 (E UNION)) 
(MOVE 2 1) 
(HLRZ@ 1 -1 P) 
(CALL 2 (E CONS)) 
G0156 
(SUB P (C 0 0 2 2)) 
(POPJ P) 
NIL 

	The  standard  PDP-10  LISP  compiler  produces the following
code:

(LAP UNION SUBR)  
	(PUSH P 1)  
	(PUSH P 2)  
	(JUMPN 1 G0002)  
	(MOVE 1 2)  
	(JRST 0 G0001)  
G0002 	(HLRZ@ 1 1)  
	(CALL 2 (E MEMBER))  
	(JUMPE 1 G0003)  
	(MOVE 2 0 P)  
	(HRRZ@ 1 -1 P)  
	(CALL 2 (E UNION))  
	(JRST 0 G0001)  
G0003 	(HLRZ@ 1 -1 P)  
	(MOVE 2 0 P)  
	(PUSH P 1)  
	(HRRZ@ 1 -2 P)  
	(CALL 2 (E UNION))  
	(POP P 2)  
	(CALL 2 (E XCONS))  
G0008  
G0001 	(SUB P (C 0 0 2 2))  
	(POPJ P)  
	NIL  
 

	Note  that all three compilers produce the same first line of
heading.  This is necessary for the LAP assembly program  which  also
requires  the %NIL% at the end as punctuation.  The standard compiler
uses a function called %XCONS% that has  the  same  effect  as  CONS%
except  that  it receives its arguments in the reverse order which is
sometimes convenient.  This requires, of course, that the compiler be
smart  enough  to  decide  where  it  is  more  convenient to put the
arguments of the %cons% function.

	My  two  compilers  are  similar in structure, but the better
one, which is twice as long, uses the following optimisations.

	1. When the argument of CAR or CDR is a variable, it compiles
a  (HLRZ@  1  i P) or (HRRZ@ 1 i P) which gets the result through the
stack without first compiling the argument into an accumulator.

	2. When it has to set up the arguments of a function  in  the
accumulators,  in general, the program must compute the arguments one
at a time and save them on the stack, and then load the  accumulators
from the stack.  However, if one of the arguments is a variable, is a
quoted expression, or can be obtained from a variable by a  chain  of
%cars%  and  %cdrs,  then  it  need not be computed until the time of
loading  accumulators  since  it  can  be  computed  using  only  the
accumulator in which it is wanted.

	3.  The  Diffy  compiler  produces about 10 percent less code
than LCOM4; the main difference seems to be that it sometimes notices
when  it  has  an  expression that it will need later. Sometimes this
feature leads it astray as in the above example wherein  saves  %a%u%
for later use at greater cost than re-computing it.

	It  should  further be noted that quoted S-expressions can be
compiled with the instruction

	(MOVEI 1 (QUOTE α)).

3. Running the compilers.

	If you want to run one of these compilers, say LCOM4, to  see
what sort of code it produces, here is the procedure:

	1. Prepare a file <file name> without an extension containing
the functions to be compiled in  LISP  S-expression  form,  i.e.  (DE
<function name> <variable list> <function body>).

	2. Give the system command RU LCOM4[206,JMC]<cr>.  LCOM4 will
reply with *.

	3. Give the RLISP command COMPL <file name>;<cr>.   COMPL  is
an  FEXPR  so  that  <file  name>  does not have to be preceded by @.
LCOM4 will respond with a list of  the  functions  compiled  and  the
lengths  of  the  compiled  programs  exaggerating slightly since all
expressions in the output list are counted. <control C> will get  you
back to the system and a file called <file name>.LAP has the compiled
programs.

	4. You can look at the program by TYPE  <file  name>.LAP<cr>,
and  if  you  want  to  try out the functions you can enter LISP by R
LISP<cr> and after answering N to the ALLOC? question, you  can  read
in  the  functions  by (DSKIN (<file name>.LAP))<cr>, and they can be
tested in the usual LISP way.